
Bingyi Kang

$\text{M}^{\text{3}}$: A Modular World Model over Streams of Tokens

Feb 20, 2025

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Jan 21, 2025

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Jan 16, 2025

Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models

Dec 18, 2024

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Dec 18, 2024

Image Understanding Makes for A Good Tokenizer for Image Generation

Nov 07, 2024

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution

Nov 04, 2024

How Far is Video Generation from World Model: A Physical Law Perspective

Nov 04, 2024

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Oct 03, 2024

Depth Anything V2

Jun 13, 2024